FRPA: A Framework for Recursive Parallel Algorithms

Authors

  • David Eliahu
  • Omer Spillinger
  • Armando Fox
  • James Demmel
Abstract

Recursion continues to play an important role in high-performance computing. However, parallelizing recursive algorithms while achieving high performance is nontrivial and can result in complex, hard-to-maintain code. In particular, assigning processors to subproblems is complicated by recent observations that communication costs often dominate computation costs. Previous work [1]–[3] demonstrates that carefully choosing which divide-and-conquer steps to execute in parallel (breadth-first steps) and which to execute sequentially (depth-first steps) can result in significant performance gains over naïve scheduling. Our Framework for Recursive Parallel Algorithms (FRPA) allows for the separation of an algorithm’s implementation from its parallelization. The programmer must simply define how to split a problem, solve the base case, and merge solved subproblems; FRPA handles parallelizing the code and tuning the recursive parallelization strategy, enabling algorithms to achieve high performance. To demonstrate FRPA’s performance capabilities, we present a detailed analysis of two algorithms: Strassen-Winograd [1] and Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication (CARMA) [3]. Our single-precision CARMA implementation is fewer than 80 lines of code and achieves a speedup of up to 11× over Intel’s Math Kernel Library (MKL) [4] matrix multiplication routine on “skinny” matrices. Our double-precision Strassen-Winograd implementation, at just 150 lines of code, is up to 45% faster than MKL for large square matrix multiplications. To show FRPA’s generality and simplicity, we implement six additional algorithms: mergesort, quicksort, TRSM, SYRK, Cholesky decomposition, and Delaunay triangulation [5]. FRPA is implemented in C++, runs in shared-memory environments, uses Intel’s Cilk Plus [6] for task-based parallelism, and leverages OpenTuner [7] to tune the parallelization strategy.
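
The split / base case / merge separation described in the abstract can be illustrated with a minimal C++ sketch. This is not FRPA's actual API: the `MergesortProblem` struct, the `solve` driver, and the `'B'`/`'D'` schedule string are all hypothetical names chosen for illustration, and `std::async` stands in for Cilk Plus so that the example depends only on the standard library. The schedule is fixed by hand here, whereas the abstract says FRPA tunes it with OpenTuner.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical problem description (not FRPA's real interface): the user
// writes only the split, base-case, and merge logic.
struct MergesortProblem {
    std::vector<int> data;

    bool is_base_case() const { return data.size() <= 4; }

    void solve_base_case() { std::sort(data.begin(), data.end()); }

    // Split into two halves, each an independent subproblem.
    std::vector<MergesortProblem> split() const {
        const auto mid =
            data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);
        return {MergesortProblem{std::vector<int>(data.begin(), mid)},
                MergesortProblem{std::vector<int>(mid, data.end())}};
    }

    // Merge two solved (sorted) halves back into this problem's data.
    void merge(const std::vector<MergesortProblem>& parts) {
        assert(parts.size() == 2);
        data.resize(parts[0].data.size() + parts[1].data.size());
        std::merge(parts[0].data.begin(), parts[0].data.end(),
                   parts[1].data.begin(), parts[1].data.end(), data.begin());
    }
};

// Hypothetical driver. schedule[d] selects how recursion level d runs:
// 'B' = breadth-first step (subproblems in parallel),
// 'D' = depth-first step (subproblems one after another).
// Levels beyond the end of the schedule run sequentially.
template <typename Problem>
void solve(Problem& p, const std::string& schedule, std::size_t depth = 0) {
    if (p.is_base_case()) {
        p.solve_base_case();
        return;
    }
    std::vector<Problem> parts = p.split();
    const bool parallel = depth < schedule.size() && schedule[depth] == 'B';
    if (parallel) {
        std::vector<std::future<void>> tasks;
        for (auto& part : parts) {
            tasks.push_back(std::async(std::launch::async,
                [&part, &schedule, depth] { solve(part, schedule, depth + 1); }));
        }
        for (auto& t : tasks) t.get();  // wait for every subproblem
    } else {
        for (auto& part : parts) solve(part, schedule, depth + 1);
    }
    p.merge(parts);
}
```

Calling `solve(problem, "BD")` would run the top recursion level in parallel and every deeper level sequentially; changing the string changes the parallelization strategy without touching `MergesortProblem`. On some toolchains `std::async` requires linking with a threading flag such as `-pthread`.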

Similar resources

An Easy-to-use Scalable Framework for Parallel Recursive Backtracking

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of these computational resources remains a challenge for several application domains. Many parallel algorithms can scale to only hundreds of cores. The limiting...

Reducing Interpolation on Multi-Grid to Quantizing Grid's Data-Base as a Recursion

In his article “Powerlist: A Structure for Parallel Recursion” Jayadev Misra wrote: “Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefix sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of ...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...

Systematic Derivation of Tree Contraction Algorithms

While tree contraction algorithms play an important role in efficient tree computation in parallel, it is difficult to develop such algorithms due to the strict conditions imposed on contracting operators. In this paper, we propose a systematic method of deriving efficient tree contraction algorithms from recursive functions on trees. We identify a general recursive form that can be parallelize...

A Programming Methodology for Designing Block Recursive Algorithms on Various Computer Networks

In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previous works, we proposed a programming methodology for designing block recursive algorithms on shared-memory and distributed-memory multiprocessors without considering the interconnection of processors. We extend the work ...

Publication date: 2015